Data

The table below shows the first six rows of the niveau data set.

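The data set contains the Aare's daily maximum water levels measured at a single station in Stilli, Untersiggenthal (canton of Aargau), together with the exact time at which each daily maximum occurred. In the table, Zeitstempel is the day, Zeitpunkt_des_Auftretens is the time at which the maximum was reached, and Wert is the water level in m ü.M. (metres above sea level).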

Stationsname | Stationsnummer | Parameter | Zeitreihe | Parametereinheit | Gewässer | Zeitstempel | Zeitpunkt_des_Auftretens | Wert | Freigabestatus
Untersiggenthal, Stilli | 2205 | Pegel | Tagesmaxima | m ü.M. | Aare | 2000-01-01 00:00:00 | 2000-01-01 00:23:00 | 326.245 | Freigegeben, validierte Daten
Untersiggenthal, Stilli | 2205 | Pegel | Tagesmaxima | m ü.M. | Aare | 2000-01-02 00:00:00 | 2000-01-02 00:43:10 | 326.153 | Freigegeben, validierte Daten
Untersiggenthal, Stilli | 2205 | Pegel | Tagesmaxima | m ü.M. | Aare | 2000-01-03 00:00:00 | 2000-01-03 00:00:00 | 326.053 | Freigegeben, validierte Daten
Untersiggenthal, Stilli | 2205 | Pegel | Tagesmaxima | m ü.M. | Aare | 2000-01-04 00:00:00 | 2000-01-04 01:43:40 | 325.871 | Freigegeben, validierte Daten
Untersiggenthal, Stilli | 2205 | Pegel | Tagesmaxima | m ü.M. | Aare | 2000-01-05 00:00:00 | 2000-01-05 21:23:00 | 325.837 | Freigegeben, validierte Daten
Untersiggenthal, Stilli | 2205 | Pegel | Tagesmaxima | m ü.M. | Aare | 2000-01-06 00:00:00 | 2000-01-06 03:33:05 | 325.835 | Freigegeben, validierte Daten
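
As a rough illustration of how these records might be handled, the following Python sketch parses the three fields used in the rest of the analysis. It is not the code behind this report (whose outputs appear to come from R); the names Record and parse_record and the tuple layout are assumptions, and the values are copied from the six rows above.

```python
from datetime import datetime
from typing import NamedTuple


class Record(NamedTuple):
    day: datetime          # Zeitstempel: the calendar day
    occurred_at: datetime  # Zeitpunkt_des_Auftretens: when the daily maximum was reached
    level: float           # Wert: water level in m ü.M. (metres above sea level)


FMT = "%Y-%m-%d %H:%M:%S"


def parse_record(zeitstempel: str, zeitpunkt: str, wert: str) -> Record:
    """Convert the three text fields of one row into typed values."""
    return Record(
        datetime.strptime(zeitstempel, FMT),
        datetime.strptime(zeitpunkt, FMT),
        float(wert),
    )


# The six rows shown in the table (Zeitstempel, Zeitpunkt_des_Auftretens, Wert).
rows = [
    ("2000-01-01 00:00:00", "2000-01-01 00:23:00", "326.245"),
    ("2000-01-02 00:00:00", "2000-01-02 00:43:10", "326.153"),
    ("2000-01-03 00:00:00", "2000-01-03 00:00:00", "326.053"),
    ("2000-01-04 00:00:00", "2000-01-04 01:43:40", "325.871"),
    ("2000-01-05 00:00:00", "2000-01-05 21:23:00", "325.837"),
    ("2000-01-06 00:00:00", "2000-01-06 03:33:05", "325.835"),
]

records = [parse_record(*row) for row in rows]
for rec in records:
    print(rec.day.date(), rec.occurred_at.time(), rec.level)
```

Keeping the day and the time of occurrence as separate fields makes it easy to check that every maximum falls on the day it is attributed to.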

Time Series: Water Levels

The plot below shows the evolution of water levels over more than 21 years, from 01-01-2000 to 01-08-2021.

First, water levels fluctuate strongly within each year, which suggests seasonality. Second, the lowest troughs are fairly constant over the years (about 325.5 metres above sea level), whereas a few peaks stand far above all the others. Peaks occur mainly in the summer months, and the three highest were recorded on 25-08-2005 (328.827 metres above sea level), on 09-06-2007 (329.323) and on 14-07-2021 (328.622). Over the observation period, three summers therefore had unusually high water levels: 2005, 2007 and 2021.
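
To put these peaks in perspective, the short Python sketch below checks that each one falls in a summer month and measures how far it rises above the usual trough level. The dates and levels are the ones quoted in the text; the definition of summer as June to August and the baseline of 325.5 are assumptions made for illustration.

```python
from datetime import datetime

# Peaks quoted in the text (DD-MM-YYYY, metres above sea level).
peaks = [("25-08-2005", 328.827), ("09-06-2007", 329.323), ("14-07-2021", 328.622)]

SUMMER_MONTHS = {6, 7, 8}  # assumption: meteorological summer, June to August
BASELINE = 325.5           # assumption: approximate trough level read off the plot


def is_summer(date_text: str) -> bool:
    """True if a DD-MM-YYYY date falls in June, July or August."""
    return datetime.strptime(date_text, "%d-%m-%Y").month in SUMMER_MONTHS


highest = max(peaks, key=lambda peak: peak[1])

for date_text, level in peaks:
    rise = round(level - BASELINE, 3)  # height of the peak above the trough level
    print(date_text, "summer" if is_summer(date_text) else "not summer", rise)
print("highest peak:", highest)
```

All three peaks lie more than three metres above the trough level, which is large compared with the spread of ordinary days.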

Value Distribution

The following output shows the five-number summary of the water level values, together with their mean.

The summary shows that the water levels are not symmetric around the mean or the median. Values below the median are tightly grouped (the minimum lies only 0.4 m below it), whereas values above it are far more dispersed (the maximum lies 3.5 m above it). Together with a mean that exceeds the median, this indicates that the distribution is right-skewed. The second observation is that the overall range is small in absolute terms: about 3.9 m separate the minimum from the maximum.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   325.4   325.6   325.8   325.9   326.1   329.3
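
The asymmetry can be quantified directly from the printed summary. The sketch below computes a quartile-based (Bowley) skewness coefficient and the two tail lengths; it uses only the rounded values from the output above, so the results are approximate, and the function name bowley_skewness is chosen here for illustration.

```python
# Values copied from the summary output (metres above sea level, rounded to 0.1).
summary = {"min": 325.4, "q1": 325.6, "median": 325.8,
           "mean": 325.9, "q3": 326.1, "max": 329.3}


def bowley_skewness(q1: float, median: float, q3: float) -> float:
    """Quartile skewness: positive when the upper quartile is further from the median."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


skew = bowley_skewness(summary["q1"], summary["median"], summary["q3"])
value_range = summary["max"] - summary["min"]
upper_tail = summary["max"] - summary["median"]   # distance from median to maximum
lower_tail = summary["median"] - summary["min"]   # distance from median to minimum

print(f"Bowley skewness: {skew:.2f}")
print(f"range: {value_range:.1f} m")
print(f"upper tail: {upper_tail:.1f} m, lower tail: {lower_tail:.1f} m")
```

A positive coefficient and an upper tail much longer than the lower one both point to right skew, in line with the mean lying above the median.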

The histogram below (i.e. the frequency distribution) confirms these observations. The data are right-skewed, and therefore non-normal: most values are concentrated at the low end, while a long right tail contains water levels much higher than usual (potential outliers) that need to be investigated.

The red curve is a smoothed version of the histogram (a density estimate), which reflects the probability density function of the values more accurately. The purple vertical line marks the median water level, 325.8 metres above sea level, and the green vertical line marks the mean, 325.9. [ADD THE MODE]

Looking at the positions of the mean and the median [and the mode] on the x-axis, we see that the mean lies to the right of the median because it is pulled toward the long upper tail. For a right-skewed distribution such as this one, the median is therefore the more robust indicator of the center. [adapt interpretation when mode is on].

[IMPROVE GRAPH: better color/theme + add legend ]

Risk Assessment

Peaks-over-Threshold Model

The Peaks-over-Threshold method identifies the extremes as the values that exceed a designated threshold \(u\). To determine a suitable threshold, we use a mean residual life (MRL) plot and then look at the distribution of the data points exceeding the threshold. The number of exceedances in a given period is assumed to follow a Poisson distribution with parameter \(\lambda\), the expected number of exceedances in that period.

The MRL plot shows, for each candidate threshold \(u\), the average amount by which the exceedances lie above \(u\). If the exceedances follow a Generalized Pareto Distribution, this average is a linear function of \(u\), so the lowest value of \(u\) above which the plot is approximately linear can generally be selected as the threshold. Therefore, to model the high water levels, we first draw an MRL plot to choose the threshold \(u\) and then apply the Peaks-over-Threshold method, that is, we examine the distribution of the exceedances to assess their probability of occurrence.
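
The quantity drawn in an MRL plot is the mean excess: for each candidate threshold, the average of (value minus threshold) over the values above it. For a Generalized Pareto Distribution with scale \(\sigma_u\) and shape \(\xi < 1\), this mean equals \(\sigma_u / (1 - \xi)\), which is linear in the threshold. The Python sketch below computes the curve; the function names are assumptions, and the six values from the Data section are used only to show the mechanics, since six points are far too few to choose a threshold.

```python
from statistics import mean


def mean_excess(values, threshold):
    """Average amount by which the values above the threshold exceed it."""
    excesses = [v - threshold for v in values if v > threshold]
    return mean(excesses) if excesses else None  # None when nothing exceeds the threshold


def mrl_curve(values, thresholds):
    """Pairs (threshold, mean excess): the points drawn in an MRL plot."""
    return [(u, mean_excess(values, u)) for u in thresholds]


# First six daily maxima from the Data section (metres above sea level).
levels = [326.245, 326.153, 326.053, 325.871, 325.837, 325.835]

for u, excess in mrl_curve(levels, [325.8, 325.9, 326.0, 326.1]):
    print(u, excess)
```

On the full series one would plot these pairs and look for the threshold beyond which the points fall roughly on a straight line.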

Clustering of the extremes

Clusters of extremes are groups of data points that lie above the chosen threshold \(u\): consecutive threshold exceedances are considered to belong to the same cluster. For the daily water levels, the Peaks-over-Threshold plot shows the different clusters of extreme values. We can then fit the Generalized Pareto Distribution (GPD) or the Point Process Model to the cluster maxima (after declustering, if the exceedances exhibit autocorrelation).
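
The clustering rule stated above (consecutive exceedances form one cluster) can be sketched in a few lines of Python. The function name cluster_maxima is an assumption, and the series is a made-up toy example, not taken from the niveau data.

```python
def cluster_maxima(values, threshold):
    """Group consecutive exceedances into clusters and return each cluster's maximum."""
    maxima = []
    current = None  # running maximum of the cluster being built, if any
    for v in values:
        if v > threshold:
            current = v if current is None else max(current, v)
        elif current is not None:
            # A value at or below the threshold closes the current cluster.
            maxima.append(current)
            current = None
    if current is not None:
        maxima.append(current)  # the series ended inside a cluster
    return maxima


# Made-up toy series for illustration only.
series = [1.0, 3.2, 3.5, 1.1, 0.9, 4.0, 1.2, 3.1, 3.3, 3.0]
print(cluster_maxima(series, threshold=2.0))  # → [3.5, 4.0, 3.3]
```

In practice a cluster is often allowed to continue across a few non-exceedances (a run length larger than one); the sketch uses the strict rule from the text.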

  • [drawbacks and advantages of using block maxima method instead ]

Analysis

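The threshold could be set at around 326 or 327 metres above sea level. Since the third quartile is 326.1, a threshold of 326 would still be exceeded on roughly a quarter of the days or more, whereas 327 lies well above the third quartile.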

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   325.4   325.6   325.8   325.9   326.1   329.3

GPD: 1st way of using Peaks-over-Threshold data
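
As a minimal sketch of what fitting a GPD to threshold excesses involves, the Python code below estimates the scale \(\sigma\) and shape \(\xi\) by the method of moments. This is a deliberately simple stand-in that needs only the standard library; it is not the estimator used in this report, and maximum likelihood is the more common choice. It relies on the GPD mean \(\sigma/(1-\xi)\) and variance \(\sigma^2/((1-\xi)^2(1-2\xi))\), so it is only valid for \(\xi < 1/2\). The function name and the synthetic test data are assumptions.

```python
import math
from statistics import mean, variance


def fit_gpd_moments(excesses):
    """Method-of-moments estimates (scale, shape) for GPD threshold excesses."""
    if len(excesses) < 2:
        raise ValueError("need at least two excesses")
    m = mean(excesses)
    s2 = variance(excesses)
    if s2 <= 0:
        raise ValueError("excesses must not all be equal")
    ratio = m * m / s2                # equals 1 - 2 * shape under the GPD
    shape = 0.5 * (1.0 - ratio)
    scale = 0.5 * m * (1.0 + ratio)   # equals mean * (1 - shape)
    return scale, shape


# Synthetic check, not real data: evenly spaced quantiles of a unit exponential,
# which is a GPD with scale 1 and shape 0.
n = 1000
toy_excesses = [-math.log(1 - (i + 0.5) / n) for i in range(n)]
scale, shape = fit_gpd_moments(toy_excesses)
print(round(scale, 2), round(shape, 2))
```

For the water levels, the excesses would be the cluster maxima minus the chosen threshold.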

Return Level with GPD
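
For a fitted GPD with threshold \(u\), scale \(\sigma\) and shape \(\xi\), the level exceeded on average once every \(m\) observations is \(x_m = u + \frac{\sigma}{\xi}\left[(m\,\zeta_u)^{\xi} - 1\right]\), where \(\zeta_u\) is the probability that an observation exceeds \(u\); for \(\xi = 0\) this becomes \(x_m = u + \sigma \log(m\,\zeta_u)\). The \(N\)-year return level uses \(m = N \times n_y\), with \(n_y\) the number of observations per year. The Python sketch below implements this formula. The function name, the default of 365.25 daily observations per year and all parameter values in the usage lines are hypothetical placeholders, not estimates from the niveau data, and the effect of declustering on \(\zeta_u\) is ignored.

```python
import math


def gpd_return_level(u, scale, shape, exceed_prob, n_years, obs_per_year=365.25):
    """Level exceeded on average once every n_years under a fitted GPD."""
    m = n_years * obs_per_year  # number of observations in the return period
    if abs(shape) < 1e-9:
        # Limiting case shape = 0 (exponential tail).
        return u + scale * math.log(m * exceed_prob)
    return u + (scale / shape) * ((m * exceed_prob) ** shape - 1.0)


# Hypothetical placeholder parameters, for illustration only.
params = dict(u=327.0, scale=0.3, shape=0.1, exceed_prob=0.01)
level_50 = gpd_return_level(n_years=50, **params)
level_100 = gpd_return_level(n_years=100, **params)
print(level_50, level_100)
```

The same function covers the 50-year and 100-year events: only n_years changes.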

Point Process Model: 2nd way of using Peaks-over-Threshold data

Return Level with Point Process Model

50-Year Events

100-Year Events